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involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to 
compute these queries by scanning the data sets each time. Warehouse applications therefore build a 
large number of summary tables, or materialized aggregate views, to help them increase the system 
performance. As changes, most notably new transactional data, are collected at the data sources, all 

Business Information 
Directories Catalog 
Decision Support 
Information Throughout 
the Enterprise. 

by Richard P. Sherman 


Metadata: 

The Missing Link 


Enterprises are devoting tremendous resources to building data warehouses and data marts. This
type of decision-support activity is being performed as part of the IT mainstream. Product vendors, systems 
integrators, and consultants are mobilized to help IT in their efforts. But often, after investing much hard work 
and resources, business users are disappointed with the results. Did the IT groups, vendors, and consultants miss 
something? 

Database query tools have proliferated over the past few years. There have been more than 100 of these tools in 
the marketplace at various times. Despite allowing business users access to virtually any database that IT can 
build, these tools have not gained the widespread usage that spreadsheets or word processors enjoy. How do 
business users locate information with these tools? How do they know what the data represents? How do they 
get the information they need? Without being able to answer these questions, business users cannot make 
effective use of these tools or the data warehouse. 

If You Build It, They Will Come. . . 

Expectations for data warehouse projects are established by an initial enthusiastic group of business users. This 
is often reinforced by a successful pilot project with these same users, who raise expectations even further. 

These business users are generally innovators and early adopters of technology. Geoffrey Moore
examines the Technology Adoption Life Cycle in his two books, Crossing the Chasm
and Inside the Tornado (both published by HarperBusiness in 1995). These business users enjoy exploiting new
technology as part of their jobs, hoping it will give them an edge in their business. Most business users, 
however, are more pragmatic and will use new technology only if it has been proven to make their jobs easier. 
Furthermore, the new technology must not require a significant investment of time on their part. These business 
users do not assume that it's always necessary to use the latest technology. They reap the technology and 
information that have been harvested for them by the early adopters. 

It is very common for IT to follow the philosophy of "If we build it, they will come." This philosophy is 
reinforced by the "data explorers" who are self-sufficient with new technology and eager to find new 
information assets. Data explorers are users of the various query and OLAP tools who enjoy exploiting new 
technology in their jobs. They delight at the success of finding new pieces of information while using these new 
tools. Data explorers have a disproportionate influence on all parties building data warehouses. They create the 
false expectation that business users will leap at data warehouses and find new, exciting information jewels 
previously locked in data basements (legacy applications to which business users could not or would not gain 
access). Typical businesspeople need some help and support in that endeavor. They will not invest the time in 
the new technology just for the joy of using it.

Data Rich, Information Poor


Initially the Internet was used by a small number of technical people. The World Wide Web and Internet 
browsers expanded its use significantly. However, as the content expanded exponentially, search engines such 
as AltaVista and Yahoo were needed to help people find information. But even the search engines were not 
enough, because inquiries returned thousands of choices. A skeptic once said that the Internet contains all the 
information you will ever need to know but cannot find. Millions of people, however, still use AOL and 
CompuServe, because these services organize the information in a more useful way. Recently, PointCast and 
others have incorporated push technology to broadcast information to users. Users select data published from 
various information channels, which are organized by content. PointCast will then "push" any updated
information from those channels to users as requested. Both approaches, regardless of their underlying 
technologies, are successful because they offer an organized information catalog for users to browse and select 
information from. 


The Missing Link 

The Business Information Directory (BID) is the missing link needed to open up data warehouses to the 
business community. It is the catalog of information that is available for decision support throughout the 
enterprise. This information includes data warehouses, data marts, OLAP, data mines, workgroup applications, 
and personal analytical databases (spreadsheets). 

The cornerstone of the BID is the "M" word: metadata. (See Figure 1 .) IT personnel cringe and business users' 
eyes glaze over when metadata is mentioned. Metadata, however, is a means to an end — an enabler to the 
desired goal of making decision-support data accessible to the business community throughout an enterprise. 

The two usual approaches to metadata are at opposite ends of the spectrum: It is either ignored or praised with
zealous fervor. If ignored, metadata will proliferate with every tool brought into the data warehouse environment.
If approached as a "religion," it will focus IT on the wrong issues. The balanced approach is to treat it as a
resource to be harnessed in successful decision-support environments.

Metadata is data about data. There are two categories of metadata: technical and business. Technical metadata is 
the description of the data needed by various tools to store, manipulate, or move data. These tools include 
relational databases, application development tools, database query tools, data modeling tools, data extraction 
tools, online analytical processing (OLAP) tools, and data mining tools. Business metadata is the description of 
the data needed by business users to understand the business context and meaning of the data. Technical 
metadata has spread like wildfire across the enterprise as more tools and types of tools are used to build 
decision-support systems (DSSs). Business metadata is contained in the business requirements and 
specifications for DSSs. It is often only online in the Word documents used in designing these systems. After it 
is used in the design phase, the business metadata is generally "shelfware" (collecting dust in three-ring binders 
on the business analyst's shelf). 
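
The distinction between the two categories can be made concrete with a small sketch. It assumes a catalog entry that pairs one piece of technical metadata with its business description; the class names, fields, and sample values are hypothetical and do not come from any product discussed in this article.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """What tools need in order to store, manipulate, or move the data."""
    database: str
    table: str
    column: str
    data_type: str


@dataclass
class BusinessMetadata:
    """What a business user needs in order to interpret the data."""
    business_term: str
    definition: str
    custodian: str = "unknown"


@dataclass
class CatalogEntry:
    """One catalog record that keeps both views of the same data element."""
    technical: TechnicalMetadata
    business: BusinessMetadata

    def describe(self) -> str:
        t, b = self.technical, self.business
        return (f"{b.business_term}: {b.definition} "
                f"[{t.database}.{t.table}.{t.column}, {t.data_type}]")


# Hypothetical sample entry, for illustration only.
entry = CatalogEntry(
    technical=TechnicalMetadata("sales_dw", "f_ord", "net_amt", "DECIMAL(12,2)"),
    business=BusinessMetadata("Net Sales",
                              "Invoiced sales less returns and discounts",
                              "Finance"),
)
print(entry.describe())
```

Keeping both descriptions in one record is only one possible design; the point is that the business definition stays online next to the physical names instead of sitting in a binder.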

Business Information Directory Functionality 

The Business Information Directory supports three main functions. First, the BID enables information 
discovery. The business user needs to find out what information is available. Data is worthless if the user does
not know it is there. In fact, as the amount of available data grows (in terms of the number of data subjects, facts,
and dimensions), business users' ability to find what they need generally decreases.

Decision-support information is located in databases and directories across the enterprise. The BID should be 
the equivalent of Yahoo or AOL in giving the business user a friendly, effective way to find out what 
information is available. 

Second, the BID promotes business understanding. Just knowing that data exists is not enough. What the data
represents is crucial to business users. They need to determine if the information is pertinent to them and how to
interpret it. Terms such as sales and profit can mean vastly different things to various business groups within an
enterprise. Business users need to understand the context of the data in order to use it properly.

Finally, once the business users know the data exists, they want it. They may want to access it now, or they may 
want it delivered to their desktop on a regular basis. The latter would be necessary for them to perform repetitive 
tasks such as weekly or monthly reports. Business users, accustomed to double-clicking on links on a Web page, 
want similar functionality in their decision-support systems. 
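
The three functions described above (find it, understand it, get it) can be sketched over a tiny in-memory catalog. This is a minimal illustration under stated assumptions: the catalog layout, the function names, and the sample entries are all hypothetical, and a real BID would hold its catalog in a database rather than a Python list.

```python
# Hypothetical catalog entries; names, definitions, and locations are invented.
CATALOG = [
    {"name": "Weekly Sales by Region", "kind": "report", "subject": "sales",
     "definition": "Net sales summed by region for each fiscal week.",
     "location": "sales_mart.weekly_region_sales"},
    {"name": "Customer Profitability", "kind": "query", "subject": "profit",
     "definition": "Gross margin per customer after allocated costs.",
     "location": "finance_dw.cust_profit_v"},
]


def discover(keyword, catalog=CATALOG):
    """Information discovery: which entries mention the keyword?"""
    needle = keyword.lower()
    return [e for e in catalog
            if needle in e["name"].lower() or needle in e["subject"].lower()]


def understand(entry):
    """Business understanding: what does this entry represent?"""
    return f'{entry["name"]} ({entry["kind"]}): {entry["definition"]}'


def access(entry):
    """Information access: where can the user get it?"""
    return entry["location"]


for hit in discover("sales"):
    print(understand(hit))
    print("  available at:", access(hit))
```

A business user in this sketch never sees a table or column name until the final step, which is the ordering the three functions imply.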

Recently, some of the more sophisticated query tools have been created as Managed Query Environments 
(MQE). This is an attempt to make the query tools more business-user friendly by using business terminology in 
developing the queries. An MQE accomplishes this through a semantic layer (metadata) that replaces the 
physical names of tables and columns with views and synonyms with business terms. This can be viewed as a 
limited information catalog. A great enhancement over the earlier generations of query tools that presented 
physical table and column names to end users, MQEs should be a selection criterion when query tools are being
evaluated. But their semantic layer, or information catalog, is too limited to extend across the data warehouses, 
data marts, and so on that are needed. 
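
A semantic layer of this kind can be pictured as a lookup from business terms to physical column names. The sketch below is an assumption-laden illustration, not the mechanism of any particular MQE product: the view name, column names, and the build_query helper are hypothetical.

```python
# Hypothetical semantic layer: one physical view and the business terms
# that stand in for its columns.
SEMANTIC_LAYER = {
    "table": "ord_sum_v",
    "columns": {
        "Customer Name": "cust_nm",
        "Net Sales": "net_amt",
        "Order Date": "ord_dt",
    },
}


def build_query(terms, layer=SEMANTIC_LAYER):
    """Translate a list of business terms into a SQL SELECT statement."""
    cols = layer["columns"]
    unknown = [t for t in terms if t not in cols]
    if unknown:
        raise KeyError(f"No business term defined for: {unknown}")
    # Each physical column is aliased back to the term the user asked for.
    select_list = ", ".join(f'{cols[t]} AS "{t}"' for t in terms)
    return f'SELECT {select_list} FROM {layer["table"]}'


print(build_query(["Customer Name", "Net Sales"]))
```

The mapping covers a single view, which mirrors the limitation noted above: a layer defined inside one query tool does not describe the other warehouses, marts, and files in the enterprise.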

The Users of a Business Information Directory 

The potential customers for the BID are business users and members of the IT groups building and operating the 
data warehouse. (See Table 1 .) The former includes both data explorers and data farmers. Data farmers, 
however, are not interested in using the query tools just for the sake of using new technology. As experienced 
spreadsheet and word-processing users, they use these technologies as tools in their jobs. They harvest the data 
that the data explorers find and turn it into information using spreadsheets to analyze this data. Data explorers 
and IT personnel can find and access data within the data warehouse with various database access and OLAP 
tools. They accomplish this by spending the time looking for the data. However, data farmers cannot exploit 
these tools effectively because of the time requirements. 

The BID's initial targets are the data farmers of the business community. They need an information catalog in which they
can search for information, understand it, and get it. It is important to note, however, that if a BID were available
to the data explorers and IT personnel, they, too, would benefit because they could exploit the data warehouse 
more effectively. Data explorers and IT personnel, however, may not perceive the need for a BID because they 
think they already have tools to access the data warehouse. 

The target market shapes what functionality the BID offers, which in turn determines what is stored in its 
information catalog. Vendors, consultants, and IT all have the data explorers in mind when considering the need 
for or designing BIDs. Table 2 examines the difference in interpretation of BID functionality between the data 
explorer and data farmer. In fact, from the data explorers' point of view, an information catalog may not be as 
critical because they are willing to search for information on their own. However, as previously noted, data 
explorers would benefit significantly from a BID. 

The BID serves two purposes for the data farmer. First, it acts as the librarian who researches what information 
is available and pertinent for the business user. Second, it is a mail-order catalog from which business users can 
order the information to arrive when they need it. This latter purpose is similar to PointCast in that business 
users want the information delivered to their desktops to use in their work. 

BID Components 

The BID is composed of four components and interfaces. (See Figure 2 , page 78.) These include the Information 
Navigator, Information Catalog, Administrator, and the Information Delivery Agent. Most products include the 
first three components, but not all products currently implement an Information Delivery Agent. 

The Information Navigator is the business user interface. It provides the navigation, understanding, and access
functionality for the BID. It interacts with the other BID components, as well as invoking various tools used by the
business user to access and manipulate information. This is the business user's view into data warehouses, data
marts, workgroup databases, and personal databases.

DBMS - August 1997 - Metadata: The Missing Link (http://www.dbmsmag.com/9708d16.html)

The Information Catalog is the brains of the BID. It stores the metadata needed to provide BID functionality.
Various import and export facilities as well as APIs are used to move metadata between different metadata
sources and the BID.

The Administrator is a superset of the Information Navigator. IT also uses this interface for BID administration. 
These functions include maintaining the Information Catalog, managing business users' access capabilities,
maintaining security, and updating metadata not handled by the Import/Export capabilities. 

The Information Delivery Agent moves the information requested by business users to their desktop or 
workgroup applications. This is equivalent to a push model in which the business user requests information to 
be delivered and it is published onto the user's desktop. 
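
The push model described for the Information Delivery Agent can be sketched as a subscribe-and-publish loop. This is a minimal sketch under assumptions: the DeliveryAgent class, its method names, and the idea of a per-user "desktop" list are hypothetical stand-ins, not the interface of any product covered here.

```python
from collections import defaultdict


class DeliveryAgent:
    """Delivers published information to every user who asked for it."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # item name -> subscribed users
        self.desktops = defaultdict(list)      # user -> delivered (item, payload)

    def subscribe(self, user, item):
        """Record that a user wants an item delivered whenever it is published."""
        if user not in self._subscribers[item]:
            self._subscribers[item].append(user)

    def publish(self, item, payload):
        """Push a new version of an item to all subscribers; return the count."""
        users = self._subscribers.get(item, [])
        for user in users:
            self.desktops[user].append((item, payload))
        return len(users)


agent = DeliveryAgent()
agent.subscribe("ana", "Weekly Sales by Region")
delivered = agent.publish("Weekly Sales by Region", "figures for the latest week")
print(delivered, agent.desktops["ana"])
```

The user makes the request once; every later publish call reaches the desktop without further action, which is what separates delivery from the query-on-demand access that the Information Navigator provides.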

Market Observations 

The Business Information Directory market is very immature. According to Crossing the Chasm, we are 
currently in the Innovators and Early Adopters stages of the Technology Adoption Life Cycle, and we have been 
in these stages for a few years. Only a handful of products are on the market today, and they are very new. Many innovators
and early adopters built their own BIDs, which greatly enhanced their data warehouse efforts. Several products 
on the market are the result of IT internal projects or consulting engagements trying to transform these efforts 
into commercial products. 

It is also a poorly understood market. Most vendors do not understand what the business users' needs really are. 
Vendors usually work with IT groups and therefore view the need for a BID through IT's eyes, which leads to a 
belief that users simply want access to databases. But this functionality is just the means to an end. The real 
objective is information access, which means finding and understanding the information in a business context,
not in the way a database administrator would find it. In addition to the vendors, IT also does not fully appreciate the
extent of the problems and needs. Most IT people are too busy to deal with metadata. Because of the 
ever-increasing pressures to deliver projects quickly, items that do not have a perceived immediate impact, such 
as metadata, are postponed. And those IT groups that do not postpone dealing with metadata are frustrated by 
vendor solutions that are, at best, partial solutions addressing a limited set of metadata sources. 

BIDs are also very diverse in nature. Most BIDs were created during specific customer engagements or as 
add-ons or extensions to existing product lines. The products from Prism Solutions Inc., Platinum Technology 
Inc., IBM Corp., Logic Works Inc., and Virtual Integration Technology Inc. were all initially built under these 
circumstances. As such, they address the particular metadata integration needs encountered for that specific 
engagement or product line. The resulting BIDs need to be expanded to meet the wide variety of environments 
encountered in the general marketplace. In addition, the engagements in which the BIDs were created were 
consulting or specific IT projects, with a lot of personal attention paid to tailoring them to be successful. With 
the move to a commercial product, the extensive consultative support is eliminated, and implementation success 
is greatly diminished. 

The Market 

The products available in the market that I will discuss are: 

• Prism Warehouse Directory 

• Platinum Data Shopper 

• IBM DataGuide 

• Logic Works Universal Directory

• Virtual Integration Technology deliveryMANAGER


Prism Warehouse Directory 

The BID with the most market visibility is the Prism Warehouse Directory (PWD). Prism, founded by Bill 
Inmon, helped define and expand the data warehouse market. The company's main product is Prism Warehouse 
Executive (PWE), the revision to Prism Warehouse Manager, which addressed building data warehouses 
through extracting, transforming, and loading them from legacy systems. This process involved mapping source 
and target systems with code being generated to do the previously mentioned tasks. Because all of the metadata 
to support these operations was input into the tool's data store, metadata documentation and management were 
provided. 

The Prism Warehouse Directory was a natural extension of the Prism Warehouse Executive - a great deal of the 
technical metadata for the BID was already available. The initial releases of the PWD were geared toward IT 
and data explorers and oriented toward the physical aspects of storage and transformation between sources, 
which was the purpose of the PWE. At that time, the BID was a totally passive catalog; users found references 
to the information they desired, wrote down where it was located, and then went into other tools to access the 
data. 


This BID has progressed significantly since its inception. Prism has partnered with several vendors to create 
import and/or export capabilities with repository, CASE, data modeling, and MQE tools. This greatly expands 
the metadata available in the information catalog. In addition, Prism has added the capability to launch 
applications once information is located. This moves the BID from a passive to an active catalog. Prism 
Warehouse Directory Web Access allows Web access to the BID and expands access to data by enabling users 
to build and launch queries to databases. 

The Prism Warehouse Directory has been installed by approximately 100 companies. It has three components: 
Directory Builder (administrative tool), Directory Navigator (end-user tool), and the Information Directory. It 
can be purchased standalone at $50,000 with five Navigator seats or bundled with the Prism Warehouse 
Executive. Almost all purchases of PWD are bundled with PWE. 

Although it has made great strides in expanding its audience, PWD is still centered around the sourcing of data 
into data warehouses or data marts. This is a key application of metadata, but it is still technically oriented and 
will appeal to IT and data explorers. If you are already a PWE customer, it is natural to utilize PWD. If you are 
not using PWE, you should evaluate other options. 


Platinum Data Shopper 

In my view, Platinum Technology's Data Shopper has the largest market share of the commercial BIDs. This 
product was acquired through Platinum's purchase of RelTech in 1995. Data Shopper uses the Platinum 
Repository (an integration of the repositories from RelTech and BrownStone Solutions) as its information 
catalog. Most of the installed base, which is approximately 300 sites for the Platinum Repository, has purchased 
Data Shopper. 

The metaphor used is that of file cabinets and folders. Information content is organized into "file cabinets," 
which are logically business subjects or topics. These are further divided into business categories. Business 
rules, logic entities, data structures, data elements, and data usage tabs are also provided. 

Data Shopper is marketed as a tool for business users to browse and understand what is contained in a data 
warehouse (via a repository). Business users can find information that they might not have otherwise known 
existed. They can identify, understand, and locate objects such as database tables and columns, queries, reports,
spreadsheets, Word documents, application programs, and other information stored in the repository.

Data Shopper lists for $500 per seat, with volume discounts applying. However, Platinum Repository is required
for the information catalog. The MVS version will easily sell for more than $100,000, and the Open Edition will 
approach $100,000 when loaded with various options. So the cost of admission is more than $100,000 and 
buying into the use of Platinum Repository. The merits of repositories in general and Platinum's in particular are 
beyond the scope of this article. If you have the Platinum Repository, you should implement Data Shopper. If 
not, then first consider whether you should purchase Platinum Repository on its own merits. 

IBM DataGuide 

IBM's DataGuide is sometimes lost among the large number of the company's product offerings. DataGuide, sold
both on its own and bundled with IBM Visual Warehouse, was limited initially to "IBM shops," with its first 
offering being OS/2 only and requiring DB2/2. It has now been released on Windows 95/NT and should offer 
Web-based access in the future. 

DataGuide provides business users with an information catalog containing metadata about both structured 
(databases) and unstructured (files) data. This data is treated as an information object and can be grouped 
together in a variety of ways. The information catalog is extensible, with the capability to add different types of 
objects. Imports and exports are achieved through published APIs or through a published command language
interface. Initially, the only metadata exchange occurred among DB2 family products, but partnerships with 
market-leading OLAP and MQE vendors have expanded this capability. 

DataGuide consists of three tools: DataGuide User, DataGuide Administrator, and Information Catalog. The 
User interface presents a tree structure of objects that the business user expands to get the contents of folders or 
more details. Business metadata and help are available on each object. Once information has been found, the 
business user can launch an application to access that information. 

DataGuide has been installed at approximately 100 companies. It costs $209 for the User tool and $1,149 for the 
Administrator tool; volume discounts apply. In addition, a version of DB2 on NT, OS/2, or MVS must be 
purchased for the Information Catalog. This is the lowest-cost tool examined in this article, but that does not 
equate to usefulness or functionality. The only prerequisite that may hinder its implementation is the use of 
DB2/NT or DB2/2 for its Information Catalog. It would be more robust if the other major relational databases 
were also offered. But the cost of DB2/x is low and its use is limited (note: the data warehouse can be in any 
relational database; it is just the Information Catalog that needs to be in DB2/x), so this should not be a criterion
to reject this BID. It is well worth the cost to explore this BID as a starting point for implementing BID 
functionality. 

Logic Works Universal Directory 

The Universal Directory was announced on April 1, 1997. Logic Works understands metadata for building 
databases, given its successful track record with the ERwin data modeling tool. This BID evolved from the idea 
of using the models generated during the design phase of your data warehouse or data mart as the base of 
metadata management. This metadata would then be expanded to incorporate more full-featured capabilities. 

Universal Directory uses a three-tier architecture with the following components: Universal Explorer (business 
user interface), Directory Administrator (administration tool), Data Server (manages flow of data between 
clients and information directory), License Server (manages concurrent use of client tools), and the Information 
Directory (stored in Microsoft SQL Server, Sybase SQL Server, or Oracle). ModelMart, which handles the 
model management database (stored with the Information Directory), is also required. Other optional products 
that integrate with these tools are ERwin/Open and ERwin/Navigator (used for viewing and editing data models, 
including star schemas), Micro Focus Revolve (used for scanning legacy data), and Sterling CLEAR:Access 
(query tool used to access a data warehouse). Clients work on Windows 95 or Windows NT while the servers
work on Windows NT.

Universal Directory sells for $30,000 for 10 Navigators, one Administrator, and one ModelMart. The company
had at least a half dozen purchases as the product was formally announced. The product is very new and does 
not have extensive metadata import and export capabilities. Logic Works' approach does favor IT and data 
explorers, especially those familiar with data modeling. However, the company has included BID capabilities to 
attract the data farmer. This is definitely a tool to watch and evaluate as it matures. 

Virtual Integration Technology (VIT) deliveryMANAGER 

Virtual Integration Technology's deliveryMANAGER is a BID concentrating on distributing data from a variety 
of decision-support systems (data warehouses, data marts, and so on) and file servers. This BID enables business 
users to find information, place orders for that information, and have it delivered to their desktops. 

The VIT deliveryMANAGER components are deliveryAGENT, metaWAREHOUSE, and deliveryADMIN.

The deliveryAGENT is the Web browser or Windows user interface to the information directory. Both 
structured and unstructured data can be cataloged and delivered. Information is arranged as information objects 
called collections. Business users search for information by subject and topics of interest; they can also obtain 
relevant business metadata. Business users can subscribe to this information and have it delivered to their 
desktops, file servers, email, or Web servers. Data delivery can be based on time or events. 

The metaWAREHOUSE is the information catalog (currently stored in Oracle) that integrates technical and 
business metadata. Both structured and unstructured data can be cataloged. 

The deliveryADMIN is the administrative tool used to manage the information directory. It handles user
security, registration of all information objects, the building of collections, and monitoring information usage. 

This is implemented on Unix and Windows NT. 

The VIT deliveryMANAGER costs $50,000. VIT is a consulting firm that is transforming itself into a product 
company. It has obtained venture financing but had funded initial product development through consulting 
engagements. deliveryMANAGER has approximately 10 installations. deliveryMANAGER is the only BID 
mentioned that has implemented an information delivery capability in addition to the information discovery and 
understanding functions. It is based on a well-engineered technical architecture, and VIT obtained hands-on
implementation experience while developing deliveryMANAGER. It is well worth evaluating, with the biggest
qualification being the risk level associated with a startup. 

Recommendations 

With all of the resources being used in building data warehouses and data marts, it is imperative to make the 
results of these projects usable by business users. Without this usage, these projects will fail to meet user 
expectations. Implementing a Business Information Directory produces the significant benefit of making the 
information visible, understandable, and available. In short, it can be the difference between success and failure. 

Data warehouse and data mart projects need to incorporate metadata management and BIDs as part of their 
objectives. Even with the immature state of the market, the currently available products offer advantages over 
ignoring these issues and capabilities. Many of the early data warehouse projects built their own BIDs, which is 
still a viable alternative. However, many IT shops today do not have the resources or time to implement their 
own custom-built solutions. 


Richard P. Sherman is currently managing Coopers & Lybrand's New England Data Warehouse/ DSS Practice. 
He can be reached at richard.sherman@us.coopers.com.

Figure 1. This figure shows metadata sources.


TABLE 1. The Potential Customers for a Business Information Directory


Class of User Category

TABLE 2. BID Functions for IT vs. Business Users

BID Functionality: Information Discovery (What information is available?)
  IT Needs: Data sources: databases, tables, columns, and servers
  Business Users (Data Explorer) Needs: Data sources: databases, tables, columns, and servers
  Business Users (Data Farmer) Needs: Lists of predefined queries, reports, business views

BID Functionality: Business Understanding (What does the data represent?)
  IT Needs: Data definitions, structures, valid domains; data mapping: cleanup and transformation rules
  Business Users (Data Explorer) Needs: Business terms, definitions; data definitions; data mapping: cleanup and transformation rules
  Business Users (Data Farmer) Needs: Business terms, definitions; algorithms, filters; where did data come from, how often updated; who is the data expert (or custodian)

Figure 2. The components and interfaces of a Business Information Directory.


* IBM Corp., White Plains, NY; 800-426-4968 or 520-574-4600; www.ibm.com. 

* Logic Works Inc., Princeton, NJ; 800-783-7946 or 609-514-1177; www.logicworks.com.

* Platinum Technology Inc., Oakbrook Terrace, IL; 800-442-6861 or 630-620-...; www.platinum.com.

* Prism Solutions Inc., Sunnyvale, CA; 408-752-1888; www.prismsolutions.com. 

* Virtual Integration Technology Inc., Cupertino, CA; 800-255-9520 or 408-255-9512; www.vit.com. 

